Speech processing system



Dec. 9, 1969. J. L. STEWART. SPEECH PROCESSING SYSTEM. Filed April 22, 1966. 2 Sheets-Sheet 2.

3,483,325
SPEECH PROCESSING SYSTEM
John L. Stewart, Menlo Park, Calif., assignor to Santa Rita Technology, Inc., Menlo Park, Calif., a corporation of Arizona
Filed Apr. 22, 1966, Ser. No. 544,531

Int. Cl. H04m 1/00; H04b 1/66; G01r 23/16
U.S. Cl. 179-1
10 Claims

ABSTRACT OF THE DISCLOSURE

A speech processing system that develops, from an electrical signal corresponding in frequency and magnitude to the speech, a first constant bandwidth of comparatively low frequency and a second variable bandwidth of a frequency somewhat higher than that of the constant bandwidth. A system for controlling or tuning the variable bandwidth filter in proportion to the frequency distribution characteristics of the input signal. The tuning circuit is arranged to generate a tuning signal that is weighted according to the frequency content of the signal so as to preserve and extract only essential frequency components of the signal.

This invention relates generally to speech processing systems, and pertains more particularly to such a system employing bandwidth compression techniques.

As a preface to a detailed description of my invention, it can be explained that all of the frequency components that exist at any instant in a speech signal are not equally important, some being unnecessary. Most of the intelligibility and naturalness in speech can be retained by passing only two relatively narrow bands of components. One of these bands, in accordance with the present invention, is fixed to cover the range of from approximately 300 to 600 cycles per second, and the other moves with the speech signal in the range from approximately 600 to 5,000 cycles per second, this latter band, or more properly sub-band, possessing a constant bandwidth to center frequency ratio of approximately unity, although this specific ratio is not critical.

With the above in mind, one object of the present invention is to provide a speech processing system which provides speech output that is generally intelligible to the human ear, even though only the relatively narrow bands of frequency components are utilized to either constitute or reconstitute the speech. More specifically, the invention has for its dual aim the provision of a system that will find utility either in processing a voice input signal to derive a voice output signal having a generally acceptable level of intelligibility or in synthesizing from a broad band noise input signal to come out with speech having the same desired characteristics as when processing an actual voice input signal. Stated somewhat differently, my invention basically involves passing frequencies in a particular fixed band toward the lower end of the audio range and superimposing on the signal passed in such a manner a controlled signal that is predicated upon certain frequency bands that are variable and which are controlled in a tuned fashion, so as to literally superimpose the controlled signal onto the more basic signal and still obtain a speech output signal that can be understood.

While my invention has the broader object which has been mentioned above, a more specific object is to provide an output voice signal that contains a certain frequency emphasis so as to improve hearing in some cases of sensori-neural deafness by shifting the normal speech frequencies to a somewhat lower value before the voice signal reaches the listener's ear. In a somewhat reverse situation, although functionally similar to improving the hearing in the above-alluded-to deafness situation, it is possible to counteract certain distortions of speech where the hearing is normal, such a case existing where either there is some impairment in the speaker's manner of speaking or he is located in an abnormal environment, such as when the speaker is under water and is breathing a mixture of helium and oxygen, which has been experienced by persons attempting to listen to undersea divers.

Thus, quite briefly, my invention involves the passing of frequencies within a certain predetermined band toward the lower end of the audio spectrum, more specifically, within the bandwidth of approximately from 300 to 600 cycles per second. Via a tunable bandwidth filter operating in the 600 to 5,000 cycles per second range, another band within such range is utilized in accordance with a tuning control signal so as to pass the necessary frequency components. Through the agency of an adder, the signals from both paths are combined and then delivered to a suitable output device which expresses the derived signal in speech form. Since the system that has been briefly described can receive an electrical input signal which has been transduced from a voice or can receive a broad band noise input signal, suitable provision can be made for processing either one or the other of such signals in order to obtain the desired output voiced or synthetic speech signal. It can be pointed out at this time, though, that when processing a voice input signal, a somewhat more simplified circuit configuration can be utilized than when synthesizing a desired signal to be used as the output speech signal. As will hereinafter become more readily understood, use is made of a pattern centroid signal for controlling and selecting the appropriate sub-bandwidth that is to be added or combined with the more basic signal falling within a bandwidth toward the lower end of the audio range. When synthesizing a signal, an area control signal is utilized and is multiplied with the signal obtained by adding or combining in order to get the speech output signal that is sought.

These and other objects and advantages of my invention will more fully appear from the following description, made in connection with the accompanying drawings, wherein like reference characters refer to the same or similar parts throughout the several views and in which:

FIGURE 1 is a view, largely in block form, illustrating a circuit arrangement that can be employed for either speech sharpening or speech synthesizing purposes;

FIGURE 2 is a gain curve showing the relative gain function for the various filters which are embodied in the analog ear analyzer shown in FIGURE 1; and

FIGURE 3 is a graphical representation of a set of e waveforms typical of those utilized in FIGURE 1 and defining a spatial pattern which may be continuous and which changes with time, the view three-dimensionally characterizing this pattern being in the form of a surface plotted against distance, intensity and time.

Since my speech processing system can be utilized either for speech sharpening or speech synthesizing purposes, it will be helpful to describe first the circuitry utilized in obtaining the simpler goal, which is the speech sharpening one. Therefore, it will be seen that my speech sharpening system includes an input device 20, such as a microphone or tape deck, serving as the means for delivering an appropriate electrical signal that has been transduced or converted from the speech to be sharpened. The electrical signal from the input device 20 is delivered through a switch 22 to a first filter 24 having a fixed bandwidth from approximately 300 to 600 cycles per second and therefore capable of passing relatively low audio frequencies. Also connected to the input device 20 through the switch 22 is a second filter 26 having a variable bandwidth for passing audio frequencies above the fixed bandwidth that has been selected for the filter 24. More specifically, the bandwidth for the filter 26 is from approximately 600 to 5,000 cycles per second, and an appropriate sub-band within this relatively wide frequency range is shifted in accordance with certain characteristics that are to be imparted to the ultimate voice signal that will be hereinafter referred to. Preferably, this sub-band has a constant bandwidth to center frequency ratio of approximately unity.
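As an illustration of the sub-band just described, the following minimal sketch computes band edges for a sub-band whose bandwidth equals its center frequency times a chosen ratio. The function name `subband_edges`, the symmetric (arithmetic) placement of the edges about the center, and the clipping of the edges to the 600 to 5,000 cycles per second range are all assumptions made for illustration; the specification states only the approximate ratio and the overall range.

```python
def subband_edges(center_hz, bw_ratio=1.0, lo_limit=600.0, hi_limit=5000.0):
    """Return (low, high) edges, in cycles per second, of a movable sub-band.

    ASSUMPTIONS (not from the specification): the edges sit symmetrically
    about the center frequency, and they are clipped to the overall range
    of the tunable filter.
    """
    half_width = 0.5 * bw_ratio * center_hz   # bandwidth = bw_ratio * center
    low = max(lo_limit, center_hz - half_width)
    high = min(hi_limit, center_hz + half_width)
    return low, high


print(subband_edges(2000.0))  # (1000.0, 3000.0)
```

With a ratio of unity, a center of 2,000 cycles per second gives a sub-band 2,000 cycles per second wide; a lower center such as 1,000 runs into the 600 cycles per second floor under the clipping assumption.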

Actually, it is desirable to feed the output from the filter 24 through an adjustable gain control 30 so that the contribution from the filter 24 may be augmented or suppressed according to the desires of the user. The now sharpened outputs are fed through switches 28 and 32 that are closed so as to bypass multiplier circuitry hereinafter referred to in conjunction with the synthesizing action that is possible with my system. The two signals via the switches 28, 32 and the signals from the multipliers 60, 62 (hereinafter more specifically referred to) are then applied directly to an adder 34 and an output device 36, which can be a set of earphones or a loudspeaker if the voice output signal is to be heard directly by the listener, or the output device can be a recorder of some type if the processed voice signal is to be used later.

From the information presented above, it is readily apparent that the manner in which the filter 26 is tuned has not as yet been dealt with. Therefore, it will be observed that the same voice input signal from the device 20 is delivered to a frequency spreader or analog ear analyzer designated generally by the reference numeral 38. In the depicted situation, the analyzer 38 comprises a common filter 39 and a plurality or bank of low-pass filters 40-1, 40-2, ..., 40-12; while 12 such low-pass filters have been mentioned, this number can be either increased or decreased as circumstances require and without any essential change taking place with respect to the practicing of the instant invention. Also included in the analyzer 38 is a group of amplifiers 42-1 through 42-12, there being one such amplifier after each filter 40-1 through 40-12 in order to provide a prescribed gain of six db. It may in some instances be advantageous to combine the functions of filtering and amplifying such that the two operations cannot be separately identified. Especially cited in this regard is that, through use of feedback amplifiers, the filters can be made to perform as if both inductance and capacitance are present but without actually having any inductors. It is to be recognized that alternative but equivalent filtering schemes exist which in one case yield a purely passive resistance-inductance-capacitance circuit. It will be helpful at this time to refer to FIGURE 2, where the gain curve scheme is graphically displayed. The particular amplified outputs of the analyzer 38 have been labeled, respectively, with the reference numerals 1 through 12. Also superimposed upon the graph constituting FIGURE 2 is the characteristic of the common filter 39, this filter suggestively having a break point at 2,500 cycles per second below which the slope of the curve is 18 db per octave. It will be recognized that the abscissa represents frequency on a logarithmic scale and the ordinate represents relative gain in decibels.

From the nature of these several curves, system variations are obvious. Part of the common filter characteristic can be associated with each filter output, as can accumulated gain values. For example, a differentiator can be associated with each output section so that the slope of the common filter characteristic below the break point becomes 12 db per octave, which is in accord with the auditory threshold characteristic of human hearing at low frequencies. The gain value of 6 db associated with each filter is predicated on the use of 12 filters giving a total accumulated gain of 72 db for the 12th stage. If the same frequency range is covered using, say, 24 filters, then maintaining the same accumulated gain value requires that each filter be associated with a gain of 3 db instead of 6 db. The intent of FIGURE 2 is in part to indicate gain values. But also it is meant to show that bandwidths of the separate low-pass filters bear a constant ratio, one to the next. The proper values for gains and bandwidths using any number of sections other than the 12 explicitly described herein should thus be obvious.
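The gain arithmetic in the preceding paragraph can be checked with a short sketch. The function names are illustrative only; the figures (72 db total, 12 or 24 sections) come from the text above.

```python
def per_filter_gain_db(total_gain_db, n_filters):
    """Gain each section must contribute so that the last stage
    reaches the stated accumulated gain."""
    return total_gain_db / n_filters


def accumulated_gain_db(stage, gain_per_filter_db):
    """Accumulated gain, in db, at the output of a given stage (1-based)."""
    return stage * gain_per_filter_db


print(per_filter_gain_db(72, 12))   # 6.0
print(per_filter_gain_db(72, 24))   # 3.0
print(accumulated_gain_db(12, 6))   # 72
```

Holding the accumulated gain fixed while changing the number of sections simply rescales the per-section gain in inverse proportion.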

It can be explained at this time that an analyzer very similar to what has been generally described herein has been more fully described in my Patent No. 3,387,093 for Speech Bandwidth Compression System, and reference may be made thereto for a more complete understanding of the analyzer 38, although the analyzer or frequency spreader described in my patent has a lesser number of filter sections and also does not have the six db gain amplifiers that have been included in the analyzer 38. Attention is directed also to another of my co-pending patent applications, namely, Method and System of Analyzing the Inner Ear, which was filed on July 12, 1965, Ser. No. 471,074.

Inasmuch as the various filters 40-1 through 40-12 are extremely overlapping, as pointed out in my co-pending application, these various filters may be thought of as making a type of weighted spectrum analysis of the speech or voice delivered from the input device 20. The output, however, from each filter 40-1 through 40-12 is rectified by a rectifier 44, and the rectified signal in each situation is fed to various filters 46, each having a bandwidth of approximately 30 cycles per second.
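The rectify-then-smooth step performed by a rectifier 44 and a filter 46 can be sketched digitally as follows. This is only a rough analogue under stated assumptions: full-wave rectification and a one-pole low-pass with a 30 cycles per second cutoff stand in for circuits whose exact form the specification does not give, and the function names and the sample rate are hypothetical.

```python
import math


def rectify(samples):
    """Full-wave rectification (ASSUMED; the text says only 'rectified')."""
    return [abs(x) for x in samples]


def one_pole_lowpass(samples, cutoff_hz, sample_rate_hz):
    """Simple one-pole RC-style low-pass; a stand-in for a filter 46."""
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)   # move a fraction alpha toward the input
        out.append(y)
    return out


def envelope(samples, sample_rate_hz, cutoff_hz=30.0):
    """Slowly varying voltage e_n for one analyzer channel."""
    return one_pole_lowpass(rectify(samples), cutoff_hz, sample_rate_hz)


# One second of a 1,000 cycles per second tone sampled at 8,000 samples/s.
tone = [math.sin(2 * math.pi * 1000 * k / 8000) for k in range(8000)]
e = envelope(tone, 8000.0)
print(round(e[-1], 2))
```

The output settles to a nearly constant level that tracks the tone's intensity, which is the sense in which the e voltages are slowly varying compared with the speech waveform.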

The outputs from the filters 46 are in the form of a set of relatively slowly varying waveforms or voltages e1, e2, ..., e12 (or whatever number of filters has been selected for the analyzer 38), in part representing temporal variations of energies of individual spectral bands. From FIGURE 3, it will be appreciated that the system or group of signals e1, e2, ..., e12 constitutes the bandwidth compressed speech to the extent that the individual e's are slowly varying and that they are similar to one another. However, these slowly varying voltage signals e1 through e12 are representative of intelligence that has been obtained from the transduced voice signal. It is this information that is further processed in order to tune the filter 26 properly so as to reflect in the signal forwarded to the adder from the filter 26 the information that is needed in providing an intelligible voice output at 34.

Therefore, attention is now drawn to the presence of a pattern centroid extractor 48 to which the various filters 46 are connected, there being terminals 50-1 through 50-12 for introducing the slowly varying e signals into the extractor. Although several suitable pattern centroid extractors have been shown and described in my patent, it will be of assistance to explain the processing action that takes place in the extractor 48, although resort can be made to my said patent if detailed information is desired. Also, it should be of help to refer to FIGURE 3 herein presented, which figure shows the various e signals that are fed to the pattern centroid extractor 48 via the terminals 50-1 through 50-12.

These various rectified and filtered voltages e1-e12 are added together in resistor/amplifier linear summing arrays contained within the extractor 48 to get two voltages of the form [expressions for EA and EB not legible in the source], and these voltages EA and EB are subtracted in one situation and added in another, with a further divisional operation performed thereon, to get the final tuning signal that is applied to the tunable bandwidth filter 26. The quantity EA can be considered a combination of voltage signals en in an increasing weighted fashion, and the quantity EB can be considered a combination of voltage signals en in a decreasing weighted fashion. The tuning signal is

\[ \text{tuning signal} = \frac{E_A - E_B}{E_A + E_B} \]

Since I have stated that reference can be made to my Patent No. 3,387,093 titled Speech Bandwidth Compression System for a more comprehensive understanding of the role performed by the extractor 48, it will simplify matters to state that the extractor in effect is providing a signal representative of the center of gravity of the area under the envelope which is labeled 51. As this center of gravity shifts, the control signal impressed on the tunable bandwidth filter 26 changes so as to shift the sub-band within the bandwidth for the filter 26. Consequently, the signal taken from the analyzer 38 through the medium of the extractor 48 is placed or converted into a usable form that controls the tuning of the filter 26 in a fashion so that intelligible information is forwarded to one input of the adder 34. Since the adder 34 receives its other signal directly from the adjustable gain control, the output from the adder 34 is in a speech sharpened form that can be used directly or recorded for subsequent use at the output device 36.
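The subtract, add and divide operation described above can be sketched numerically. The specification says only that EA weights the e voltages in increasing fashion and EB in decreasing fashion; the linear weights 1, 2, ..., n and n, ..., 2, 1 used as defaults below are an assumption chosen for illustration, and the function name `tuning_signal` is hypothetical.

```python
def tuning_signal(e, weights_a=None, weights_b=None):
    """Return (EA - EB) / (EA + EB) for a list of slowly varying voltages e.

    ASSUMPTION: default weights are linear ramps (1..n for EA, n..1 for EB).
    The actual weights are not given in the text available here.
    """
    n = len(e)
    if weights_a is None:
        weights_a = [k + 1 for k in range(n)]   # increasing weights (assumed)
    if weights_b is None:
        weights_b = [n - k for k in range(n)]   # decreasing weights (assumed)
    ea = sum(w * v for w, v in zip(weights_a, e))
    eb = sum(w * v for w, v in zip(weights_b, e))
    total = ea + eb
    if total == 0:
        return 0.0                              # no signal: leave tuning centered
    return (ea - eb) / total


print(tuning_signal([1.0] * 12))                # 0.0
```

Under these assumed weights the result behaves like a normalized center of gravity: a flat pattern gives zero, energy concentrated in e1 drives the value toward its negative limit, and energy concentrated in e12 drives it toward its positive limit. Dividing by EA + EB makes the result insensitive to the overall level of the input.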

Having described acquisition of the centroid in the foregoing, it is now stated that all of the 12 signals may not be used for this purpose. It was previously stated that the tunable filter covers the range above 600 cycles per second, which range is not properly represented in part by voltages e9-e12. In the actual process of centroid extraction, these voltages are attenuated or even eliminated from the computation for EA and EB. In the case of complete elimination, there results [expressions not legible in the source], where it is to be understood that a situation intermediate between total usage and total elimination of voltages e9-e12 may be most appropriate.
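The range of options described here, from total usage to total elimination of e9-e12, can be expressed as a single attenuation factor applied before the centroid sums are formed. The helper below is a hypothetical illustration; the name `attenuate_low_band` and the use of one common factor for all four voltages are assumptions.

```python
def attenuate_low_band(e, factor, first_index=8):
    """Scale e9..e12 (zero-based indices 8..11) by a factor in [0, 1].

    factor = 1.0 is total usage, factor = 0.0 is complete elimination,
    and anything between is the intermediate situation.
    ASSUMPTION: one common factor is applied to all of e9-e12.
    """
    if not 0.0 <= factor <= 1.0:
        raise ValueError("factor must lie between 0 and 1")
    return [v if i < first_index else factor * v for i, v in enumerate(e)]


e = [1.0] * 12
print(attenuate_low_band(e, 0.0))   # e9-e12 removed from the computation
print(attenuate_low_band(e, 0.5))   # e9-e12 halved
```

The attenuated list would then be fed to whatever weighted sums form EA and EB.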

Having given the above-presented description, the manner in which my speech processing system is employed for synthesizing purposes will be more easily understood. It will be observed that the switch 22 has two positions, and when in the phantom outline position, it is connected directly to a broad band noise input source 54. If desired, the source 54, which provides white Gaussian noise, can be augmented with a buzzing sound from a buzz source 56 so that the human larynx can be emulated.

However, since in this situation, which is currently being described, it is desired to synthesize or build up a voice output at 36, the broad band noise signal from the source 54 is not forwarded to the analyzer 38. Instead, the same voice input signal as in the case of speech sharpening is employed. In this case, the desire is to synthesize speech from ordinary voice using control signals which are relatively simple compared to the speech waveform itself.

As with the speech sharpening procedure, the filters 46 supply slowly varying e voltages which are impressed in the same manner as heretofore on the terminals 50-1 through 50-12 of the extractor 48. Hence, the tuning control signal delivered via the line 52 to the filter 26 is used in the same fashion as was done when using the voice input device 20.

Not only is a tuning control signal derived in the synthesizing of speech, as described above, but two pattern area control signals are also obtained from the slowly varying e voltages through addition of combinations of the slowly varying voltages e1-e12. These are designated A1 and A2, composed as [expressions not legible in the source], which may be modified somewhat through inclusion of some of e9 in A1 and/or reduction of various of the components in A2. Means for acquiring A1 and A2 from signals e1-e12 are believed to be so well known that specific descriptions are not needed. It will be recognized that A1 and A2 are representative of partial areas under the envelope 51 of FIGURE 3, A1 being representative of the partial area taken over a distance related to the frequency range passed by filters 40-1 to 40-8, about 800 to 20,000 cycles per second, and A2 being representative of the area taken over a distance related to the frequency range passed by filters 40-9 to 40-12, which is about 300-600 cycles per second. These partial areas are measures of the average intensity of the signals in the associated frequency ranges. The area measure A2 is directed to one input of the previously mentioned multiplier 62, which has its other input connected to the tunable filter 26. In a similar manner, the area measure A1 is directed to the multiplier 60, the multiplier 60 having its other input connected to the 300-600 c.p.s. filter 24 via the adjustable gain control 30. Hence, instead of the filters 24 and 26 delivering their outputs directly to the adder 34, as was done when obtaining a speech sharpening action, when synthesizing, the filter outputs are delivered to multipliers 60 and 62 to be operated upon in accordance with the area measures A1 and A2, and the resulting products are fed to the adder 34 and thence to the output device 36 in order to provide a constituted or synthesized speech signal that contains a sufficient number of frequency components of proper relative intensity as to have a generally intelligible level as far as the human ear is concerned.
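The multiplier-and-adder arrangement used when synthesizing can be sketched sample by sample. The routing below follows the paragraph above as written (A1 with the fixed filter 24 path through the gain control 30, A2 with the tunable filter 26 path). The function name `synthesize`, the list-based signals, and the treatment of the gain control as a plain scale factor are assumptions made for illustration; how A1 and A2 are formed from the e voltages is not shown because the expressions are not available here.

```python
def synthesize(low_band, high_band, a1, a2, low_gain=1.0):
    """Combine the two filtered paths under area control.

    low_band  : samples from the fixed 300-600 c.p.s. filter 24
    high_band : samples from the tunable filter 26
    a1, a2    : area control signals, one value per sample (ASSUMED format)
    low_gain  : setting of the adjustable gain control 30 (ASSUMED linear)
    """
    out = []
    for lo, hi, g1, g2 in zip(low_band, high_band, a1, a2):
        product_60 = g1 * (low_gain * lo)   # multiplier 60
        product_62 = g2 * hi                # multiplier 62
        out.append(product_60 + product_62) # adder 34
    return out


print(synthesize([1.0, 2.0], [0.5, 0.5], [2.0, 2.0], [4.0, 0.0]))  # [4.0, 4.0]
```

Setting both area signals to a constant 1.0 reduces this to the simple addition used for speech sharpening, which is the effect of closing the bypass switches 28 and 32.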

Owing to the fact that the operation has been given in each instance, it is believed unnecessary to have a separate operational description at this particular time. It will be appreciated, though, that when either sharpening speech or synthesizing speech, there is produced an appropriate tuning signal that carries sufficient information or intelligence therein so that when applied to the tunable bandwidth filter 26 an appropriate sub-band of frequency is transmitted to the adder 34 for combining with the signal forwarded from the fixed bandwidth filter 24 through the intermediary of the adjustable gain control 30. It has previously been explained that the gain control 30 provides individual adjustment to suit the listening tastes of the individual when the speech signal is transduced at the output device 36.

It will, of course, be understood that various changes may be made in the form, details, arrangements and proportions of the parts without departing from the scope of my invention as set forth in the appended claims.

I claim:

1. A speech processing system comprising means for providing an input electrical signal containing various frequencies within the audio band, first filter means connected to said input means having a fixed bandwidth for passing relatively low audio frequencies, second filter means connected to said input means having a variable bandwidth for passing audio frequencies above said fixed bandwidth, means for combining the respective outputs from said first and second filter means, means connected to said combining means for providing a speech signal, and means for tuning said second filter means to selected frequencies so that said speech signal is generally intelligible to the human ear, said tuning means including means for forming a plurality of discrete voltage signals en that correspond to the magnitude of frequency selected portions of the input electrical signal, first combining means for combining the discrete voltage signals in increasing weighted fashion to produce a composite signal EA, second combining means for combining the discrete voltage signals in decreasing weighted fashion to produce a composite signal EB, and means constituting the output of said tuning means for generating the voltage function

\[ \frac{E_A - E_B}{E_A + E_B} \]

2. A speech processing system as defined in claim 1 in which said first filter means has a fixed bandwidth from approximately 300 to about 600 cycles per second, and said second filter means has a tunable sub-band within a bandwidth of from approximately 600 to about 5,000 cycles per second.

3. A speech processing system as defined in claim 1 in which said input providing means provides a signal transduced from speech sound.

4. A speech processing system as defined in claim 3 in which said first filter means has a fixed bandwidth from approximately 300 to about 600 cycles per second, and said second filter means has a tunable sub-band within a bandwidth of from approximately 600 to about 5,000 cycles per second, said sub-band having a center frequency to bandwidth ratio of approximately unity.

5. A speech processing system as defined in claim 4 in which said system further includes an adjustable gain control connected between said first filter means and said combining means for selectively varying the contribution from said first filter means to said speech signal.

6. A speech processing system as defined in claim 1 in which said input means provides a broad band noise signal, said system further including means for supplying to said analyzer a signal transduced from speech sound.

7. A speech processing system as defined in claim 6 in which said first filter means has a fixed bandwidth from approximately 300 to about 600 cycles per second, and said second filter means has a tunable sub-band within a bandwidth of from approximately 600 to about 5,000 cycles per second.

8. A speech processing system as defined in claim 7, further including means connected between each of said first and second filter means and said combining means for adjusting the relative average intensity of the respective outputs from said first and second filter means.

9. A speech processing system as defined in claim 8, in which each of said intensity adjusting means comprises a multiplier for providing an output signal which is the product of two input signals, the respective outputs of each of said first and second filter means being connected to one input of the corresponding multiplier, said system further including means responsive to said analyzer for providing a first area control signal related to the average intensity of frequency components of said transduced speech signal in the range of about 300 to about 600 cycles per second and a second area control signal related to the average intensity of frequency components of said transduced speech signal above about 800 cycles per second, means for supplying said first area control signal to the other input of the multiplier connected to said first filter means, and means for supplying said second area control signal to the other input of the multiplier connected to said second filter means.

10. A speech processing system as defined in claim 9 in which said input means includes a source of buzzing sound.

References Cited

UNITED STATES PATENTS

3,431,356 3/1969 Copel.
2,906,955 9/1956 Edson et al.
3,078,345 2/1963 Campanella et al. 179-15.55
3,176,073 3/1965 Samuelson et al.
3,376,386 5/1968 Pant.
3,387,093 6/1968 Stewart.
3,394,228 7/1968 Flanagan et al.

KATHLEEN H. CLAFFY, Primary Examiner
ARTHUR A. MCGILL, Assistant Examiner

U.S. Cl. X.R.

